[57] ABSTRACT

A sinusoidal model for acoustic waveforms is applied to develop a new analysis/synthesis technique which characterizes a waveform by the amplitudes, frequencies, and phases of component sine waves. These parameters are estimated from a short-time Fourier transform. Rapid changes in the highly-resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. The component values are interpolated from one frame to the next to yield a representation that is applied to a sine wave generator. The resulting synthetic waveform preserves the general waveform shape and is perceptually indistinguishable from the original. Furthermore, in the presence of noise the perceptual characteristics of the waveform as well as the noise are maintained. The method and devices are particularly useful in speech coding, time-scale modification, frequency scale modification and pitch modification.
PROCESSING OF ACOUSTIC WAVEFORMS

The U.S. Government has rights in this invention pursuant to the Department of the Air Force Contract No. F19628-80-C-0002.

TECHNICAL FIELD

The field of this invention is speech technology generally and, in particular, methods and devices for analyzing, digitally-encoding, modifying and synthesizing speech or other acoustic waveforms.

BACKGROUND OF THE INVENTION

Typically, the problem of representing speech signals is approached by using a speech production model in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract. In many speech applications it suffices to assume that the glottal excitation can be in one of two possible states corresponding to voiced or unvoiced speech. In the voiced speech state the excitation is periodic with a period which is allowed to vary slowly over time relative to the analysis frame rate (typically 10-20 msecs). For the unvoiced speech state the glottal excitation is modelled as random noise with a flat spectrum. In both cases the power level in the excitation is also considered to be slowly time-varying.

While this binary model has been used successfully to design narrowband vocoders and speech synthesis systems, its limitations are well known. For example, often the excitation is mixed, having both voiced and unvoiced components simultaneously, and often only portions of the spectrum are truly harmonic. Furthermore, the binary model requires that each frame of data be classified as either voiced or unvoiced, a decision which is particularly difficult to make if the speech is also subject to additive acoustic noise.

Speech coders at rates compatible with conventional transmission lines (i.e. 2.4-9.6 kilobits per second) would meet a substantial need. At such rates the binary model is ill-suited for coding applications. Additionally, speech processing devices and methods that allow the user to modify various parameters in reconstructing the waveform would find substantial usage. For example, time-scale modification (without pitch alteration) would be a very useful feature for a variety of speech applications (i.e. slowing down speech for translation purposes or speeding it up for scanning purposes) as well as for musical composition or analysis. Unfortunately, time-scale (and other parameter) modifications also are not accomplished with high quality by devices employing the binary model.

Thus, there exists a need for better methods and devices for processing audible waveforms. In particular, speech coders operable at mid-band rates and in noisy environments as well as synthesizers capable of maintaining the perceptual quality of speech while changing the rate of articulation would satisfy long-felt needs and provide substantial contributions to the art.

SUMMARY OF THE INVENTION

It has been discovered that speech analysis and synthesis as well as coding and time-scale modification can be accomplished simply and effectively by employing a time-frequency representation of the speech waveform which is independent of the speech state. Specifically, a sinusoidal model for the speech waveform is used to develop a new analysis-synthesis technique.

The basic method of the invention includes the steps of: (a) selecting frames (i.e. windows of about 20-40 milliseconds) of samples from the waveform; (b) analyzing each frame of samples to extract a set of frequency components; (c) tracking the components from one frame to the next; and (d) interpolating the values of the components from one frame to the next to obtain a parametric representation of the waveform. A synthetic waveform can then be constructed by generating a series of sine waves corresponding to the parametric representation.

In one simple embodiment of the invention, a device is disclosed which uses only the amplitudes and frequencies of the component sine waves to represent the waveform. In this so-called "magnitude-only" system, phase continuity is maintained by defining the phase to be the integral of the instantaneous frequency. In a more comprehensive embodiment, explicit use is made of the measured phases as well as the amplitudes and frequencies of the components.

The invention is particularly useful in speech coding and time-scale modification and has been demonstrated successfully in both of these applications. Robust devices can be built according to the invention to operate in environments of additive acoustic noise. The invention also can be used to analyze single and multiple speaker signals, music or even biological sounds. The invention will also find particular applications, for example, in reading machines for the blind, in broadcast journalism editing and in transmission of music to remote players.

In one illustrated embodiment of the invention, the basic method summarized above is employed to choose amplitudes, frequencies, and phases corresponding to the largest peaks in a periodogram of the measured signal, independently of the speech state. In order to reconstruct the speech waveform, the amplitudes, frequencies, and phases of the sine waves estimated on one frame are matched and allowed to continuously evolve into the corresponding parameter set on the successive frame. Because the number of estimated peaks is not constant and slowly varying, the matching process is not straightforward. Rapidly varying regions of speech such as unvoiced/voiced transitions can result in large changes in both the location and number of peaks. To account for such rapid movements in spectral energy, the concept of "birth" and "death" of sinusoidal components is employed in a nearest-neighbor matching method based on the frequencies estimated on each frame. If a new peak appears, a "birth" is said to occur and a new track is initiated. If an old peak is not matched, a "death" is said to occur and the corresponding track is allowed to decay to zero. Once the parameters on successive frames have been matched, phase continuity of each sinusoidal component is ensured by unwrapping the phase. In one preferred embodiment the phase is unwrapped using a cubic phase interpolation function having parameter values that are chosen to satisfy the measured phase and frequency constraints at the frame boundaries while maintaining maximal smoothness over the frame duration. Finally, the corresponding sinusoidal amplitudes are simply interpolated in a linear manner across each frame.

In speech coding applications, pitch estimates are used to establish a set of harmonic frequency bins to which the frequency components are assigned. (Pitch is
used herein to mean the fundamental rate at which a speaker's vocal cords are vibrating). The amplitudes of the components can be coded directly using adaptive pulse code modulation (ADPCM) across frequency or indirectly using linear predictive coding. In each harmonic frequency bin the peak having the largest amplitude is selected and assigned to the frequency at the center of the bin. This results in a harmonic series based upon the coded pitch period. The phases can then be coded by using the frequencies to predict phase at the end of the frame, unwrapping the measured phase with respect to this prediction and then coding the phase residual using 4 bits per phase peak. If there are not enough bits available to code all of the phase peaks (e.g. for low-pitch speakers), phase tracks for the high frequency peaks can be artificially generated. In one preferred embodiment, this is done by translating the frequency tracks of the base band peaks to the high frequency of the uncoded phase peaks. This new coding scheme has the important property of adaptively allocating the bits for each speaker and hence is self-tuning to both low- and high-pitched speakers. Although pitch is used to provide side information for the coding algorithm, the standard voice-excitation model for speech is not used. This means that recourse is never made to a voiced-unvoiced decision. As a consequence the invention is robust in noise and can be applied at various data transmission rates simply by changing the rules for the bit allocation.

The invention is also well-suited for time-scale modification, which is accomplished by time-scaling the amplitudes and phases such that the frequency variations are preserved. The time-scale at which the speech is played back is controlled simply by changing the rate at which the matched peaks are interpolated. This means that the time-scale can be speeded up or slowed down by any factor and this factor can be time-varying. This rate can be controlled by a panel knob which allows an operator complete flexibility for varying the time-scale. There is no perceptual delay in performing the time-scaling.

The invention will next be described in connection with certain illustrated embodiments. However, it should be clear that various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention. For example, other sampling techniques can be substituted for the use of a variable frame length and Hamming window. Moreover, the length of such frames and windows can vary in response to the particular application. Likewise, frequency matching can be accomplished by various means. A variety of commercial devices are available to perform Fourier analysis; such analysis can also be performed by custom hardware or specially-designed programs.

Various techniques for extracting pitch information can be employed. For example, the pitch period can be derived from the Fourier transform. Other techniques such as the Gold-Malpass techniques can also be used. See generally, M. L. Malpass, "The Gold Pitch Detector in a Real Time Environment", Proc. of EASCON 1975 (Sept. 1975); B. Gold, "Description of a Computer Program for Pitch Detection", Fourth International Congress on Acoustics, Copenhagen, Aug. 21-28, 1962; and B. Gold, "Note on Buzz-Hiss Detection", J. Acoust. Soc. Amer. 36, 1659-1661 (1964), all incorporated herein by reference.

Various coding techniques can also be used interchangeably with those described below. Channel encoding techniques are described in J. N. Holmes, "The JSRU Channel Vocoder", Inst. of Electrical Eng. Proceedings (British), 27, 53-60 (1980). Adaptive pulse code modulation is described in L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals (Prentice Hall 1978). Linear predictive coding is described by J. D. Markel, Linear Prediction of Speech (Springer-Verlag, 1967). These teachings are also incorporated by reference.

It should be appreciated that the term "interpolation" is used broadly in this application to encompass various techniques for filling in data values between those measured at the frame boundaries. In the magnitude-only system linear interpolation is employed to fill in amplitude and frequency values. In this simple system phase values are obtained by first defining a series of instantaneous frequency values by interpolating matched frequency components from one frame to the next and then integrating the series of instantaneous frequency values to obtain a series of interpolated phase values. In the more comprehensive system the phase value of each frame is derived directly and a cubic polynomial equation preferably is employed to obtain maximally smooth interpolations from frame to frame.

Other techniques that accomplish the same purpose are also referred to in this application as interpolation techniques. For example, the so-called "overlap and add" method of filling in data values can also be used. In this method a weighted overlapping function can be applied to the resulting sine waves generated during each frame and then the overlapped values can be summed to fill in the values between those measured at the frame boundaries.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of one embodiment of the invention in which only the magnitude and frequencies of the components are used to reconstruct a sampled waveform.

FIG. 2 is an illustration of the extracted amplitude and frequency components of a waveform sampled according to the present invention.

FIG. 3 is a general illustration of the frequency matching method of the present invention.

FIGS. 4A-4F are detailed schematic illustrations of a frequency matching method according to the present invention.

FIG. 5 is an illustration of tracked frequency components of an exemplary speech pattern.

FIG. 6 is a schematic block diagram of another embodiment of the invention in which magnitude and phase of frequency components are used to reconstruct a sampled waveform.

FIG. 7 is an illustrative set of cubic phase interpolation functions for smoothing the phase functions useful in connection with the embodiment of FIG. 6 from which the "maximally smooth" phase function is selected.

FIG. 8 is a schematic block diagram of another embodiment of the invention particularly useful for time-scale modification.

FIG. 9 is a schematic block diagram showing an embodiment of the system estimation function of FIG. 8.

FIG. 10 is a block diagram of one real-time implementation of the invention.
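
By way of illustration only, the magnitude-only interpolation summarized earlier (linear interpolation of amplitude and frequency between frame boundaries, with phase obtained by accumulating the instantaneous frequency) might be sketched as follows for a single matched track over one frame. The function name, its parameters and the use of Hz units are assumptions made for this sketch and do not appear in the specification; the sketch is not the disclosed implementation.

```python
import math


def synthesize_track_frame(amp_start, amp_end, freq_start_hz, freq_end_hz,
                           n_samples, fs_hz, phase_start=0.0):
    """Synthesize one frame of one matched sine-wave track (hypothetical helper).

    Amplitude and frequency are linearly interpolated from their values at
    frame k to their values at frame k+1. The phase is a running sum of the
    instantaneous frequency, so it is continuous by construction.
    Returns (samples, phase_end); pass phase_end as phase_start of the next
    frame to keep the waveform continuous across the frame boundary.
    """
    samples = []
    phase = phase_start
    for n in range(1, n_samples + 1):
        frac = n / n_samples  # reaches 1.0 exactly at the frame k+1 boundary
        amp = amp_start + (amp_end - amp_start) * frac
        freq = freq_start_hz + (freq_end_hz - freq_start_hz) * frac
        # Phase recursion: add 2*pi*f(n)/fs to the previous phase.
        phase += 2.0 * math.pi * freq / fs_hz
        samples.append(amp * math.cos(phase))
    return samples, phase
```

In such a sketch a "birth" corresponds to amp_start = 0 and a "death" to amp_end = 0, with the same frequency at both ends of the frame. A complete synthesizer would sum the outputs of all tracks for each frame.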
DETAILED DESCRIPTION

In the present invention the speech waveform is modelled as a sum of sine waves. If s(n) represents the sampled speech waveform then

$$ s(n) = \sum_{i} a_i(n) \sin[\phi_i(n)] \tag{1} $$

where $a_i(n)$ and $\phi_i(n)$ are time-varying amplitudes and phases of the i'th tone.

In a simple embodiment the phase can be defined to be the integral of the instantaneous frequency $f_i(n)$ and therefore satisfies the recursion

$$ \phi_i(n) = \phi_i(n-1) + 2\pi f_i(n)/f_s \tag{2} $$

where $f_s$ is the sampling frequency. If the tones are harmonically related, then

$$ f_i(n) = i \, f_0(n) \tag{3} $$

where $f_0(n)$ represents the fundamental frequency at time n. One particularly attractive property of the above model is the fact that phase continuity, hence waveform continuity, is guaranteed as a consequence of the definition of phase in terms of the instantaneous frequency. This means that waveform reconstruction is possible from the "magnitude-only" spectrum since a high-resolution spectral analysis reveals the amplitudes and frequencies of the component sine waves.

A block diagram of an analysis/synthesis system according to the invention is illustrated in FIG. 1. As shown in FIG. 1, system 10 includes sampling window 11, a discrete Fourier transform (DFT) analyzer 12, magnitude computer 13, a frequency amplitude estimator 14, and an optional coder 16 in the transmitter segment and a frequency matching means 18, an interpolator 20 and a sine wave generator 22 in the receiver segment of the system. The peaks of the magnitude of the discrete Fourier transform (DFT) of a windowed waveform are found simply by determining the locations of a change in slope (concave down). In addition, the total number of peaks can be limited and this limit can be adapted to the expected average pitch of the speaker.

In a simple embodiment the speech waveform can be digitized at a 10 kHz sampling rate, low-pass filtered at 5 kHz, and analyzed at 20 msec frame intervals with a 20 msec Hamming window. Speech representations according to the invention can also be obtained by employing an analysis window of variable duration. For some applications it is preferable to have the width of the analysis window be pitch adaptive, being set, for example, at 2.5 times the average pitch period with a minimum width of 20 msec.

Plotted in FIG. 2 is a typical periodogram for a frame of speech along with the amplitudes and frequencies that are estimated using the above procedure. The DFT was computed using a 512-point fast Fourier transform (FFT). Different sets of these parameters will be obtained for each analysis frame. To obtain a representation of the waveform over time, frequency components measured on one frame must be matched with those that are obtained on a successive frame.

FIG. 3 illustrates the basic process of frequency component matching. If the number of peaks were constant and slowly varying from frame to frame, the problem of matching the parameters estimated on one frame with those on a successive frame would simply require a frequency ordered assignment of peaks. In practice, however, there will be spurious peaks that come and go due to the effects of sidelobe interaction; the locations of the peaks will change as the pitch changes; and there will be rapid changes in both the location and the number of peaks corresponding to rapidly-varying regions of speech, such as at voiced/unvoiced transitions. In order to account for such rapid movements in the spectral peaks, the present invention employs the concept of "birth" and "death" of sinusoidal components as part of the matching process.

The matching process is further explained by consideration of FIG. 4. Assume that peaks up to frame k have been matched and a new parameter set for frame k+1 is generated. Let the chosen frequencies on frames k and k+1 be denoted by $\omega_0^k, \omega_1^k, \ldots, \omega_{N-1}^k$ and $\omega_0^{k+1}, \omega_1^{k+1}, \ldots, \omega_{M-1}^{k+1}$ respectively, where N and M represent the total number of peaks selected on each frame ($N \neq M$ in general). One process of matching each frequency in frame k, $\omega_n^k$, to some frequency in frame k+1, $\omega_m^{k+1}$, is given in the following three steps.

Step 1

Suppose that a match has been found for frequencies $\omega_0^k, \omega_1^k, \ldots, \omega_{n-1}^k$. A match is now attempted for frequency $\omega_n^k$. FIG. 4(a) depicts the case where all frequencies $\omega_m^{k+1}$ in frame k+1 lie outside a "matching interval" $\Delta$ of $\omega_n^k$, i.e.,

$$ |\omega_n^k - \omega_m^{k+1}| \geq \Delta \tag{4} $$

for all m. In this case the frequency track associated with $\omega_n^k$ is declared "dead" on entering frame k+1, and $\omega_n^k$ is matched to itself in frame k+1, but with zero amplitude. Frequency $\omega_n^k$ is then eliminated from further consideration and Step 1 is repeated for the next frequency in the list, $\omega_{n+1}^k$.

If on the other hand there exists a frequency $\omega_m^{k+1}$ in frame k+1 that lies within the matching interval about $\omega_n^k$ and is the closest such frequency, i.e.,

$$ |\omega_n^k - \omega_m^{k+1}| < |\omega_n^k - \omega_i^{k+1}| < \Delta \tag{5} $$

for all $i \neq m$, then $\omega_m^{k+1}$ is declared to be a candidate match to $\omega_n^k$. A definitive match is not yet made, since there may exist a better match in frame k to the frequency $\omega_m^{k+1}$, a contingency which is accounted for in Step 2.

Step 2

In this step, a candidate match from Step 1 is confirmed. Suppose that a frequency $\omega_n^k$ of frame k has been tentatively matched to frequency $\omega_m^{k+1}$ of frame k+1. Then, if $\omega_m^{k+1}$ has no better match to the remaining unmatched frequencies of frame k, the candidate match is declared to be a definitive match. This condition, illustrated in FIG. 4(c), is given by

$$ |\omega_m^{k+1} - \omega_n^k| < |\omega_m^{k+1} - \omega_i^k| \quad \text{for } i > n \tag{6} $$

where the first bracketed value in Equation 6 is illustrated as $\sigma_2$ in FIG. 4 and the second bracketed value of Equation 6 is illustrated as $\sigma_1$. When this occurs, frequencies $\omega_n^k$ and $\omega_m^{k+1}$ are eliminated from further consideration and Step 1 is repeated for the next frequency in the list, $\omega_{n+1}^k$.

If the condition (6) is not satisfied, then the frequency $\omega_m^{k+1}$ in frame k+1 is better matched to the frequency $\omega_{n+1}^k$ in frame k than it is to the test frequency $\omega_n^k$. Two additional cases are then considered. In the first case, illustrated in FIG. 4(d), the adjacent remaining
lower frequency $\omega_{m-1}^{k+1}$ (if one exists) lies below the matching interval, hence no match can be made. As a result, the frequency track associated with $\omega_n^k$ is declared "dead" on entering frame k+1, and $\omega_n^k$ is matched to itself with zero amplitude. In the second case, illustrated in FIG. 4(e), the frequency $\omega_{m-1}^{k+1}$ is within the matching interval about $\omega_n^k$ and a definitive match is made. After either case Step 1 is repeated using the next frequency in the frame k list, $\omega_{n+1}^k$. It should be noted that many other situations are possible in this step, but to keep the tracker alternatives as simple as possible only the two cases are discussed.

Step 3

When all frequencies of frame k have been tested and assigned to continuing tracks or to dying tracks, there may remain frequencies in frame k+1 for which no matches have been made. Suppose that $\omega_m^{k+1}$ is one such frequency, then it is concluded that $\omega_m^{k+1}$ was "born" in frame k and its match, a new frequency, $\omega_m^{k+1}$, is created in frame k with zero magnitude. This is done for all such unmatched frequencies. This last step is illustrated in FIG. 4(f).

The results of applying the tracker to a segment of real speech are shown in FIG. 5, which demonstrates the ability of the tracker to adapt quickly through transitory speech behavior such as voiced/unvoiced transitions, and mixed voiced/unvoiced regions.

In the simple "magnitude-only" system, synthesis is accomplished in a straightforward manner. Each pair of matched frequencies (and their corresponding magnitudes) are linearly interpolated across consecutive frame boundaries. As noted above, in the magnitude-only system, phase continuity is guaranteed by the definition of phase in terms of the instantaneous frequency. The interpolated values are then used to drive a sine wave generator which yields the synthetic waveform as shown in FIG. 1. It should be noted that performance is improved by reducing the correlation window size, $\Delta$, at higher frequencies.

A further feature shown in FIG. 1 (and discussed in detail below) is that the present invention is ideally suited for performing time-scale modification. From FIG. 3 it can be seen that by simply expanding or compressing the time scale, the locations and magnitudes are preserved while modifying their rate of change in time. To effect a rate of change b, the synthesizer interpolation rate R' (see FIG. 1) is given by R'=bR. Furthermore, with this system it is straightforward to invoke a time-varying rate of change since frequencies may be stretched or compressed by varying the interpolation rate in time.

FIG. 6 shows a block diagram of a more comprehensive system in which phases are measured directly. As shown in FIG. 6, the more comprehensive system 30 includes a sampling window 32, a discrete Fourier transform (DFT) analyzer 34, peak estimator 36, and phase calculator 38 in the analysis section, and a cubic phase interpolator 40, a linear amplitude interpolator 42, a sine wave generator 44, amplitude modulator 46 and summer 48 in the synthesis section. In this system the frequency components and their amplitudes are determined in the same manner as the magnitude-only system described above and illustrated in FIG. 1. Phase measurements, however, are derived directly from the discrete Fourier transform by computing the arctangents at the estimated frequency peaks.

Since in the comprehensive system of FIG. 6 a set of amplitudes, frequencies and phases are estimated for each frame, it might seem reasonable to estimate the original speech waveform on the k'th frame by generating synthetic speech using the equation,

$$ s(n) = \sum_{l=1}^{L^k} A_l^k \cos[n\omega_l^k + \theta_l^k] \tag{7} $$

for $kN < n \leq (k+1)N$. Due to the time-varying nature of the parameters, however, this straightforward approach leads to discontinuities at the frame boundaries which seriously degrade the quality of the synthetic speech. Therefore, a method must be found for smoothly interpolating the parameters measured from one frame to those that are obtained on the next.

As a result of the frequency matching algorithm described in the previous section, all of the parameters measured for an arbitrary frame k are associated with a corresponding set of parameters for frame k+1. Letting $[A_l^k, \omega_l^k, \theta_l^k]$ and $[A_l^{k+1}, \omega_l^{k+1}, \theta_l^{k+1}]$ denote the successive sets of parameters for the l'th frequency track, an obvious solution to the amplitude interpolation problem is to take

$$ \hat{A}(n) = A^k + \frac{(A^{k+1} - A^k)}{N} \, n \tag{8} $$

where n = 1, 2, . . . , N is the time sample into the k'th frame. (The track subscript "l" has been omitted for convenience).

Unfortunately such a simple approach cannot be used to interpolate the frequency and phase because the measured phase, $\theta^k$, is obtained modulo $2\pi$. Hence, phase unwrapping must be performed to insure that the frequency tracks are "maximally smooth" across frame boundaries. The first step in solving this problem is to postulate a phase interpolation function that is a cubic polynomial, namely

$$ \theta(t) = \zeta + \gamma t + \alpha t^2 + \beta t^3 \tag{9} $$

It is convenient to treat the phase function as though it were a function of a continuous time variable t, with t=0 corresponding to frame k and t=T corresponding to frame k+1. The parameters of the polynomial must be chosen to satisfy the frequency and phase measurements obtained at the frame boundaries. Since the instantaneous frequency is the derivative of the phase, then

$$ \theta'(t) = \gamma + 2\alpha t + 3\beta t^2 \tag{10} $$

and it follows that at the starting point, t=0,

$$ \theta(0) = \zeta = \theta^k, \qquad \theta'(0) = \gamma = \omega^k \tag{11} $$

and at the terminal point, t=T

$$ \theta(T) = \theta^k + \omega^k T + \alpha T^2 + \beta T^3 = \theta^{k+1} + 2\pi M, \qquad \theta'(T) = \omega^k + 2\alpha T + 3\beta T^2 = \omega^{k+1} \tag{12} $$

where again the track subscript "l" is omitted for convenience.

Since the terminal phase $\theta^{k+1}$ is measured modulo $2\pi$, it is necessary to augment it by the term $2\pi M$ (M is an integer) in order to make the resulting frequency function "maximally smooth". At this point M is unknown, but for each value of M, whatever it may be,
(12) can be solved for $\alpha(M)$ and $\beta(M)$ (the dependence on M has now been made explicit). The solution is easily shown to satisfy the matrix equation:

$$ \begin{bmatrix} \alpha(M) \\ \beta(M) \end{bmatrix} = \begin{bmatrix} 3/T^2 & -1/T \\ -2/T^3 & 1/T^2 \end{bmatrix} \begin{bmatrix} \theta^{k+1} - \theta^k - \omega^k T + 2\pi M \\ \omega^{k+1} - \omega^k \end{bmatrix} \tag{13} $$

In order to determine M and ultimately the solution to the phase unwrapping problem, an additional constraint needs to be imposed that quantifies the "maximally smooth" criterion. FIG. 7 illustrates a typical set of cubic phase interpolation functions for a number of values of M. It seems clear on intuitive grounds that the best phase function to pick is the one that would have the least variation. This is what is meant by a maximally smooth frequency track. In fact, if the frequencies were constant and the vocal tract were stationary, the true phase would be linear. Therefore a reasonable criterion for "smoothness" is to choose M such that

$$ f(M) = \int_0^T [\theta''(t; M)]^2 \, dt \tag{14} $$

is a minimum, where $\theta''(t; M)$ denotes the second derivative of $\theta(t; M)$ with respect to the time variable t.

Although M is integer valued, since f(M) is quadratic in M, the problem is most easily solved by minimizing f(x) with respect to the continuous variable x and then choosing M to be the integer closest to x. After straightforward but tedious algebra, it can be shown that the minimizing value of x is

$$ x^* = \frac{1}{2\pi} \left[ (\theta^k + \omega^k T - \theta^{k+1}) + (\omega^{k+1} - \omega^k) \frac{T}{2} \right] \tag{15} $$

from which M* is determined and used in (13) to compute $\alpha(M^*)$ and $\beta(M^*)$, and in turn, the unwrapped phase interpolation function

$$ \theta(t) = \theta^k + \omega^k t + \alpha(M^*) t^2 + \beta(M^*) t^3 \tag{16} $$

This phase function not only satisfies all of the measured phase and frequency endpoint constraints, but also unwraps the phase in such a way that $\theta(t)$ is maximally smooth.

Since the above analysis began with the assumption of an initial unwrapped phase $\theta^k$ corresponding to frequency $\omega^k$ at the start of frame k, it is necessary to specify the initialization of the frame interpolation procedure. This is done by noting that at some point in time the track under study was born. When this event occurred, an amplitude, frequency and phase were measured at frame k+1 and the parameters at frame k to which these measurements correspond were defined by setting the amplitude to zero (i.e., $A^k = 0$) while maintaining the same frequency (i.e., $\omega^k = \omega^{k+1}$). In order to insure that the phase interpolation constraints are satisfied initially, the unwrapped phase is defined to be the measured phase $\theta^{k+1}$ and the start-up phase is defined to be

$$ \theta^k = \theta^{k+1} - \omega^{k+1} N \tag{17} $$

where N is the number of samples traversed in going from frame k+1 back to frame k.

As a result of the above phase unwrapping procedure, each frequency track will have associated with it an instantaneous unwrapped phase which accounts for both the rapid phase changes due to the frequency of each sinusoidal component, and the slowly varying phase changes due to the glottal pulse and the vocal tract transfer function. Letting $\theta_l(t)$ denote the unwrapped phase function for the l'th track, then the final synthetic waveform will be given by

$$ s(n) = \sum_{l=1}^{L^k} \hat{A}_l(n) \cos[\theta_l(n)] \tag{18} $$

The invention as described in connection with FIG. 6 [text illegible in source] rate, high-quality speech [text illegible in source] and thus, phase coding is a high priority. Since the sinusoidal representation also requires the specification of the amplitudes and frequencies, it is clear that relatively few peaks can be coded before all of the available bits are used. The first step, therefore, is to significantly reduce the number of parameters that must be coded. One way to do this is to force all of the frequencies to be harmonic.

During voiced speech one would expect all of the peaks to be harmonically related and therefore, by coding the fundamental, the locations of all of the frequencies will be available at the receiver. During unvoiced speech the frequency locations of the peaks will not be harmonic. However, it is well known from random process theory that noise-like waveforms can be represented (in an ensemble mean-squared error sense) in terms of a harmonic expansion of sine waves provided the spacing between adjacent harmonics is small enough that there is little change in the power spectrum envelope (i.e. intervals less than about 100 Hz). This representation preserves the statistical properties of the input speech provided the amplitudes and phases are randomly varying from frame to frame. Since the amplitudes and phases are to be coded, this random variation inherent in the measurement variables can be preserved in the synthetic waveform.

As a practical matter it is preferable to estimate the fundamental frequency that characterizes the set of frequencies in each frame, which in turn relates to pitch extraction. For example, pitch extraction can be accomplished by selecting the fundamental frequency of a harmonic set of sine waves to produce the best fit to the input waveform according to a perceptual criterion. Other pitch extraction techniques can also be employed.

As an immediate consequence of using the harmonic frequency model, it follows that the number of sine wave components to be coded is the bandwidth of the coded speech divided by the fundamental. Since there is no guarantee that the number of measured peaks will equal this harmonic number, provision should be made for adjusting the number of peaks to be coded. Based on the fundamental, a set of harmonic frequency bins is established and the number of peaks falling within each bin is examined. If more than one peak is found, then only the amplitude and phase corresponding to the largest peak are retained for coding. If there are no peaks in a given bin, then an artificial peak is created having an amplitude and phase obtained by sampling
the short-time Fourier Transform at the frequency corresponding to the center of the bin.

The amplitudes are then coded by applying the same techniques used in channel vocoders. That is, a gain level is set, for example, by using 3 bits with 2 dB per level to code the amplitude of a first peak (i.e. the first peak above 300 Hz). Subsequent peaks are coded logarithmically using delta-modulation techniques across frequency. In one simulation 3.6 kbps were assigned to code the amplitudes at a 50 Hz frame rate. Adaptive bit allocation rules are used to assign bits to peaks. For example, if the pitch is high there will be relatively few peaks to code, and there will be more bits per peak. Conversely, when the pitch is low there will be relatively few bits per peak, but since the peaks will be closer together their values will be more correlated, hence the ADPCM coder should be able to track them well.

To code the phases a fixed number of bits per peak (typically 4 or 5) is used. One method for coding the phases is to assign the measured phase to one of $2^n$ equal subdivisions of the $-\pi$ to $\pi$ region, where n = 4 or 5. Another method uses the frequency track corresponding to the phase (to be coded) to predict the phase at the end of the current frame, unwrap the value, and then code the phase residual using ADPCM techniques with 4 or 5 bits per phase peak. Since there remains only 4.4 kbps to code the phases and the fundamental (7 bits are used), then at a 50 Hz frame rate, it will be possible to code at most 16 peaks. At a 4 kHz speech bandwidth and four bits per phase, all of the phases will be coded provided the pitch is greater than 250 Hz. If the pitch is less than 250 Hz provision has to be made for regenerating a phase track for the uncoded high frequency peaks. This is done by computing a differential frequency that is the difference between the derivative of the instantaneous cubic phase and the linear interpolation of the end point frequencies for that track. The differential frequency is translated to the high frequency region by adding it to the linear interpolation of the end point frequencies corresponding to the track of the uncoded phase. The resulting instantaneous frequency function is then integrated to give the instantaneous phase function that is applied to the sine wave generator. In this way the phase coherence intrinsic in the voiced speech and the phase incoherence characteristic of unvoiced speech are effectively translated to the uncoded frequency regions.

In FIG. 8 another embodiment of the invention is shown, particularly adapted for time-scale modification. As shown in FIG. 8, the time-scale modification system 50 includes a sampling window 52, a fast Fourier transform (FFT) analyzer 54, a system contribution estimator 56, an excitation magnitude estimator 58, an excitation phase calculator 60, a linear interpolator 62 (for interpolating the system "magnitudes" and "phases", as well as the excitation "magnitudes" of the spectral components from frame-to-frame), and a cubic interpolator 64 (for interpolating the excitation phase values from frame-to-frame). The system 50 also includes a peak detector 68 and frequency matcher 68 which control the interpolators 62 and 64 in a manner analogous to the techniques discussed above in connection with the other embodiments.

Time-scale modification is achieved by rate controller 70 which provides adjustments to the rate of interpolation in interpolators 62 and 64 to slow down or speed up the processing of the waveforms. The modified waveforms are then synthesized by sine wave generator 72 and summer 74. In this illustration, the representative sine waves are further defined to consist of system contributions (i.e. from the vocal tract) and excitation contributions (i.e. from the vocal cords). The excitation phase contributions are singled out for cubic interpolation. The procedure generally follows that described above in connection with other embodiments; however, in a further step the measured amplitudes $A_\ell^k$ and phases $\theta_\ell^k$ are decomposed into vocal tract and excitation components. The approach is to first form estimates of the vocal tract amplitude and phase as functions of frequency at each analysis frame (i.e., $M(\omega, kR)$ and $\Phi(\omega, kR)$). System amplitude and phase estimates at the selected frequencies $\omega_\ell^k$ are then given by:

$$M_\ell^k = M(\omega_\ell^k, kR) \qquad (19)$$

and

$$\Phi_\ell^k = \Phi(\omega_\ell^k, kR) \qquad (20)$$

Finally, the excitation parameter estimates at each analysis frame boundary are obtained as

$$a_\ell^k = A_\ell^k / M_\ell^k \qquad (21)$$

and

$$\Omega_\ell^k = \theta_\ell^k - \Phi_\ell^k \qquad (22)$$

The decomposition problem then becomes that of estimating $M(\omega, kR)$ and $\Phi(\omega, kR)$ as functions of frequency from the high resolution spectrum $X(\omega, kR)$. (In practice, of course, uniformly spaced frequency samples are available from the DFT.) There exist a number of established ways for separating out the system magnitude from the high-resolution spectrum, such as all-pole modeling and homomorphic deconvolution. If the transfer function is assumed to be minimum phase, the logarithm of the system magnitude and the system phase form a Hilbert transform pair. Under this condition, a phase estimate $\Phi(\omega, kR)$ can be derived from the logarithm of a magnitude estimate $M(\omega, kR)$ of the system function through the Hilbert transform. Furthermore, the resulting phase estimate will be smooth and unwrapped as a function of frequency.

One approach to estimation of the system magnitude, and the corresponding estimation of the system phase through the use of the Hilbert Transform, is shown in FIG. 9 and is based on a homomorphic transformation. In FIG. 9, a homomorphic analysis system 90 is shown consisting of a logarithmic operator 92, a fast Fourier transform (FFT) calculator 94, a right-sided window, an inverse FFT calculator 96 and an exponential operator 98. In this technique, the separation of the system amplitude from the high-resolution spectrum and the computation of the Hilbert transform of this amplitude estimate are in effect performed simultaneously. The Fourier transform of the logarithm of the high-resolution magnitude is first computed to obtain the "cepstrum". A right-sided window, with duration proportional to the average pitch period, is then applied. The imaginary component of the resulting inverse Fourier transform is the desired phase and the real part is the smooth log-magnitude. In practice, uniformly spaced samples of the Fourier transform are computed with the FFT. The length of the FFT was chosen at 512 which was sufficiently large to avoid aliasing in the cepstrum. Thus, the high-resolution spectrum used to estimate the sine wave frequencies is also used to estimate the vocal-tract system function.

The remaining analysis steps in the time-scale modifying system of FIG. 8 are analogous to those described
above in connection with the other embodiments. As a result of the matching algorithm, all of the amplitudes and phases of the excitation and system components measured for an arbitrary frame k are associated with a corresponding set of parameters for frame k+1. The next step in the synthesis is to interpolate the matched excitation and system parameters across frame boundaries. The interpolation procedures are based on the assumption that the excitation and system functions are slowly-varying across frame boundaries. This is consistent with the assumption that the model parameters are slowly-varying relative to the duration of the vocal tract impulse response. Since this slowly-varying constraint maps to a slowly-varying excitation and system amplitude, it suffices to interpolate these functions linearly.

Since the vocal tract system is assumed slowly-varying over consecutive frames, it is reasonable to assume that its phase is slowly-varying as well and thus linear interpolation of the phase samples will also suffice. However, the characteristic of "slowly-varying" is more difficult to achieve for the system phase than for the system magnitude. This is because an additional constraint must be imposed on the measured phase; namely that the phase be smooth and unwrapped as a function of frequency at each frame boundary. It can be shown that if the system phase is obtained modulo $2\pi$ then linear interpolation can result in a (falsely) rapidly-varying system phase between frame boundaries. The importance of the use of a homomorphic analyzer of FIG. 9 is now evident. The system phase estimate derived from the homomorphic analysis is unwrapped in frequency and thus slowly-varying when the system amplitude (from which it was derived) is slowly-varying. Linear interpolation of samples of this function results then in a phase trajectory which reflects the underlying vocal tract movement. This phase function is referred to as $\Phi_\ell(t)$ where $\Phi_\ell(0)$ corresponds to the $\Phi_\ell^k$ of Equation 22. Finally, as before, a cubic polynomial is employed to interpolate the excitation phase and frequency. This will be referred to as $\Omega_\ell(t)$ where $\Omega_\ell(0)$ corresponds to $\Omega_\ell^k$ of Equation 22.

The goal of time-scale modification is to maintain the perceptual quality of the original speech while changing the apparent rate of articulation. This implies that the frequency trajectories of the excitation (and thus the pitch contour) are stretched or compressed in time and the vocal tract changes at a slower or faster rate. The synthesis method of the previous section is ideally suited for this transformation since it involves summing sine waves composed of vocal cord excitation and vocal tract system contributions for which explicit functional expressions have been derived.

Speech events which take place at a time $t_0$ according to the new time scale will have occurred at $\rho^{-1} t_0$ in the original time scale. To apply the above sine wave model to time-scale modification, the "events" which are time-scaled are the system amplitudes and phases, and the excitation amplitudes and frequencies, along each frequency track. Since the parameter estimates of the unmodified synthesis are available as continuous functions of time, then in theory, any rate change is possible. In conjunction with Equations (19)-(22) the time-scaled synthetic waveform can be expressed as:

$$s'(n) = \sum_{\ell=1}^{L(n)} A_\ell(\rho^{-1} n)\,\cos\left[\Omega_\ell(\rho^{-1} n)/\rho^{-1} + \Phi_\ell(\rho^{-1} n)\right] \qquad (23)$$

where $L(n)$ is the number of sine waves estimated at time n. The required values in equation (23) are obtained by simply scaling $A_\ell(t)$, $\Omega_\ell(t)$ and $\Phi_\ell(t)$ at a time $\rho^{-1} n$ and scaling the resulting excitation phase by $\rho^{-1}$.

With the proposed time-scale modification system, it is also straightforward to apply a time-varying rate change. Here the time-warping transformation is given by

$$t_0 = W(t_0') = \int_0^{t_0'} \rho(\tau)\,d\tau \qquad (24)$$

where $\rho(\tau)$ is the desired time-varying rate change. In this generalization, each time-differential $d\tau$ is scaled by a different factor $\rho(\tau)$. Speech events which take place at a time $t_0$ in the new time scale will now occur at a time $t_0' = W^{-1}(t_0)$ in the original time scale. If $t_{n-1}$ maps back to $t_{n-1}'$, then one approximation is given by:

$$t_n' \approx t_{n-1}' + \rho^{-1}(t_{n-1}') \qquad (25)$$

Since the parameters of the sinusoidal components are available as continuous functions of time, they can always be found at the required $t_n'$.

Letting $t_n'$ denote the inverse to time $t_n = n$, the synthetic waveform is then given by:

$$s'(n) = \sum_{\ell=1}^{L(n)} A_\ell(t_n')\,\cos\left[\Omega_\ell'(n) + \Phi_\ell(t_n')\right] \qquad (26)$$

where

$$\Omega_\ell'(n) = \Omega_\ell'(n-1) + \omega_\ell(t_n') \qquad (27)$$

and

[Equation (28) is not legible in the source text.]

where $\omega_\ell(t)$ is a quadratic function given by the first derivative of the cubic phase function $\Omega_\ell(t)$, and where

[equation not legible in the source text]

At the time a particular track is born, the cubic phase function $\Omega_\ell'(n)$ is initialized by the value $\rho(t_n')\,\Omega_\ell(t_n')$ where $\Omega_\ell(t_n')$ is the excitation phase obtained [...].

It will be appreciated that the invention can be used to perform frequency and pitch scaling. The short-time spectral envelope of the synthetic waveform can be varied by scaling each frequency component and the pitch of the synthetic waveform can be altered by scaling the excitation-contributed frequency components.

An embodiment 100 of the invention has been implemented and operated in real time. The illustrated embodiment was implemented in 16-bit fixed point arithmetic using four Lincoln Digital Signal Processors (LDSPs). The foreground program operates on every input A/D sample, collecting input speech samples into 10 msec buffers 102. At the same time a 10 msec buffer of synthesized speech is played out through a D/A converter. At the end of each frame, the most recent speech is pushed down into a buffer 104. It is from this buffer that the data for the pitch-adaptive Hamming window 106 is drawn, and a 512 point Fourier Transform is applied by FFT calculator 108. Next a set of amplitudes and frequencies is obtained by magnitude estimator 110 and peak detector 112 by locating the peaks of the magnitude of the FFT. The data is supplied to the pitch extraction module 114 from which is generated
the pitch estimate that controls the pitch-adaptive windows. This parameter is also supplied to the coding module 116 in the data compression application. Once the pitch has been estimated another pitch adaptive Hamming window 118 is buffered and the data transferred by I/O operator 120 to another LDSP for parallel computation. Another 512 point FFT is taken by FFT calculator 122 for the purpose of estimating the amplitudes, frequencies and phases, to which the coding and speech modification methods will be applied. Once these peaks have been determined the frequency tracking and phase interpolation methods are implemented. Depending upon the application, these parameters would be coded by coder 116 or modified to effect a speech transformation and transferred to another pair of LDSPs, where the sum of sine waves synthesis is implemented. The resulting synthetic waveform is then transferred back to the master LDSP where it is put into the appropriate buffer to be accessed by the foreground program for D/A output.

We claim:

1. A method of processing an acoustic waveform, the method comprising:
sampling the waveform to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of samples;
analyzing each frame of samples to extract a set of variable frequency components having individual amplitudes;
matching said variable components from one frame to a next frame such that a component in one frame is matched with a component in a successive frame that has a similar value regardless of shifts in frequency and spectral energy; and
interpolating the matched values of the components from the one frame to the next frame to obtain a parametric representation of the waveform whereby a synthetic waveform can be constructed by generating a set of sine waves corresponding to the interpolated values of the parametric representation.

2. The method of claim 1 wherein the step of sampling further includes determining a pitch period for said waveform and varying the length of the frame in accordance with the pitch period, the length being at least twice the pitch period of the waveform.

3. The method of claim 2 wherein the step of sampling further includes sampling the waveform according to a pitch-adaptive Hamming window.

4. The method of claim 1 wherein the step of analyzing further includes analyzing each frame by Fourier analysis.

5. The method of claim 1 wherein the step of analyzing further includes selecting a harmonic series to approximate the frequency components.

6. The method of claim 5 wherein the step of selecting a harmonic series further includes determining a pitch period for the waveform and varying the number of frequency components in the harmonic series in accordance with the pitch period of the waveform.

7. The method of claim 1 wherein the step of tracking further includes matching a frequency component from the one frame with a component in the next frame having a similar value.

8. The method of claim 7 wherein said matching further provides for the birth of new frequency components and the death of old frequency components.

9. The method of claim 1 wherein the step of interpolating values further includes defining a series of instantaneous frequency values by interpolating matched frequency components from the one frame to the next frame and then integrating the series of instantaneous frequency values to obtain a series of interpolated phase values.

10. The method of claim 1 wherein the step of interpolating further includes deriving phase values from frequency and phase measurements taken at each frame and then interpolating the phase measurements.

11. The method of claim 1 wherein the step of interpolating is achieved by performing an overlap and add function.

12. The method of claim 1 wherein the method further includes coding the frequency components for digital transmission.

13. The method of claim 12 wherein the frequency components are limited to a predetermined number defined by a plurality of harmonic frequency bins.

14. The method of claim 13 wherein the amplitude of only one of said components is coded for gain and the amplitudes of the others are coded relative to the neighboring component at the next lowest frequency.

15. The method of claim 12 wherein the phases are coded by applying pulse code modulation techniques to a predicted phase residual.

16. The method of claim 12 wherein high frequency regeneration is applied.

17. The method of claim 1 wherein the method further comprises constructing a synthetic waveform by generating a series of constituent sine waves corresponding in frequency and amplitude to the extracted components.

18. The method of claim 17 wherein the time-scale of said reconstructed waveform is varied by changing the rate at which said series of constituent sine waves are interpolated.

19. The method of claim 18 wherein the time-scale is continuously variable over a defined range.

20. The method of claim 17 wherein the pitch of the synthetic waveform is varied by adjusting the frequency of each frequency component while maintaining the overall spectral envelope.

21. The method of claim 1 wherein the method further comprises constructing a synthetic waveform by generating a series of constituent sine waves corresponding in frequency, amplitude, and phase to the extracted components.

22. The method of claim 21 wherein the time-scale of said reconstructed waveform is varied by changing the rate at which said series of constituent sine waves are interpolated.

23. The method of claim 22 wherein the time-scale is continuously variable over a defined range.

24. The device of claim 22 wherein the device further comprises means for constructing a synthetic waveform by generating a series of constituent sine waves corresponding in frequency and amplitude to the extracted components.

25. The device of claim 24 wherein the device further includes means for varying the time-scale of said reconstructed waveform by changing the rate at which said series of constituent sine waves are interpolated.

26. The device of claim 25 wherein the means for varying the time-scale is continuously variable over a defined range.
27. The device of claim 24 wherein the constituent sine waves are further defined by system contributions and excitation contributions and wherein the means for varying the time-scale of said reconstructed waveform further includes means for changing the rate at which parameters defining the system contributions of the sine waves are interpolated.

28. The device of claim 27 wherein the device further includes a scaling means for scaling the frequency components.

29. The device of claim 1/ wherein the device further includes a scaling means for scaling the excitation-contributed frequency components.

30. The method of claim 21 wherein the constituent sine waves are further defined by system contributions and excitation contributions and wherein the time-scale of said reconstructed waveform is varied by changing the rate at which parameters defining the system contributions of the sine waves are interpolated.

31. The method of claim 30 wherein the pitch of the synthetic waveform is altered by adjusting the frequencies of the excitation-contributed frequency components while maintaining the overall spectral envelope.

32. A device for processing an acoustic waveform, the device comprising:
sampling means for sampling the waveform to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of samples;
analyzing means for analyzing each frame of samples to extract a set of variable frequency components having individual amplitudes;
matching means for matching said variable components from one frame to a next frame such that a component in one frame is matched with a component in a successive frame that has a similar value regardless of shifts in frequency and spectral energy; and
interpolating means for interpolating the matched values of the components from the one frame to the next frame to obtain a parametric representation of the waveform whereby a synthetic waveform can be constructed by generating a set of sine waves corresponding to the interpolated values of the parametric representation.

33. The device of claim 32 wherein the sampling means further includes means for constructing a frame having variable length, which varies in accordance with the pitch period, the length being at least twice the pitch period of the waveform.

34. The device of claim 32 wherein the sampling means further includes means for sampling according to a Hamming window.

35. The device of claim 32 wherein the analyzing means further includes means for analyzing each frame by Fourier analysis.

36. The device of claim 32 wherein the analyzing means further includes means for selecting a harmonic series to approximate the frequency components.

37. The device of claim 36 wherein the number of frequency components in the harmonic series varies according to the pitch period of the waveform.

38. The device of claim 32 wherein the tracking means further includes means for matching a frequency component from the one frame with a component in the next frame, having a similar value.

39. The device of claim 38 wherein said matching means further provides for the birth of new frequency components and the death of old frequency components.

40. The device of claim 38 wherein the frequency components are limited to a predetermined number defined by a plurality of harmonic frequency bins.

41. The device of claim 40 wherein the amplitude of only one of said components is coded for gain and the amplitudes of the others are coded relative to the neighboring component of the next lowest frequency.

42. The device of claim 32 wherein the interpolating means further includes means defining a series of instantaneous frequency values by interpolating matched frequency components from the one frame to the next frame and means for integrating the series of instantaneous frequency values to obtain a series of interpolated phase values.

43. The device of claim 32 wherein the interpolating means further includes means for deriving phase values from the frequency and phase measurements taken at each frame and then interpolating the phase measurements.

44. The device of claim 32 wherein the interpolating means further includes means for performing an overlap and add function.

45. The device of claim 32 wherein the device further includes coding means for coding the frequency components for digital transmission.

46. The device of claim 45 wherein the coding means further comprises means for applying pulse code modulation techniques to a predicted phase residual.

47. The device of claim 45 wherein the coding means further comprises means for generating high frequency components.

48. The device of claim 32 wherein the device further comprises means for constructing a synthetic waveform by generating a series of constituent sine waves corresponding in frequency, amplitude, and phase to the extracted components.

49. The device of claim 48 wherein the device further includes means for varying the time-scale of said reconstructed waveform by changing the rate at which said series of constituent sine waves are interpolated.

50. The device of claim 49 wherein the means for varying the time-scale is continuously variable over a defined range.

51. A coded speech transmission system comprising:
sampling means for sampling a speech waveform to obtain a series of discrete samples and for constructing therefrom a series of frames, each frame spanning a plurality of samples;
analyzing means for analyzing each frame of samples by Fourier analysis to extract a set of variable frequency components having individual amplitude values;
coding means for coding the component values;
decoding means for decoding the coded values after transmission and for reconstituting the variable components;
matching means for matching the reconstituted, variable components from one frame to a next frame such that a component in one frame is matched with a component in a successive frame that has a similar value regardless of shifts in frequency and spectral energy; and
interpolation means for interpolating the values of the frequency components from the one frame to the next frame to obtain a representation of the waveform whereby synthetic speech can be constructed
by generating a set of sine waves corresponding to the interpolated values of the parametric representation.

52. The device of claim 51 wherein the coding means further includes means for selecting a harmonic series of bins to approximate the frequency components and the number of bins varies according to the pitch of the waveform.

53. The device of claim 51 wherein the amplitude of only one of said components is coded for gain and the amplitudes of the other components are coded relative to the neighboring component at the next lowest frequency.

54. The device of claim 51 wherein the amplitudes of the components are coded by linear prediction techniques.

55. The device of claim 51 wherein the amplitudes of the components are coded by adaptive delta modulation techniques.

56. The device of claim 51 wherein the analyzing means further comprises means for measuring phase values for each frequency component.

57. The device of claim 56 wherein the coding means further includes means for coding the phase values by applying pulse code modulations to a predicted phase residual.

58. A device for altering the time-scale of an audible waveform, the device comprising:
sampling means for sampling the waveform to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of samples;
analyzing means for analyzing each frame of samples to extract a set of variable frequency components having individual amplitudes;
matching means for matching said variable components from one frame to a next frame such that a component in one frame is matched with a component in a successive frame that has a similar value
regardless of shifts in frequency and spectral energy;
interpolating means for interpolating the amplitude and frequency values of the components from the one frame to the next frame to obtain a representation of the waveform whereby a synthetic waveform can be constructed by generating a set of sine waves corresponding to the interpolated representation;
interpolation rate adjusting means for altering the rate of interpolation; and
synthesizing means for constructing a time-scaled synthetic waveform by generating a series of constituent sine waves corresponding in frequency and amplitude to the extracted components, the sine waves being generated at said alterable interpolation rate.

59. The device of claim 58 wherein the interpolation rate adjusting means is continuously variable over a defined range.

60. The device of claim 58 wherein the analyzing means further comprises means for measuring phase values for each frequency component.

61. The device of claim 60 wherein the component phase values are interpolated by cubic interpolation.

62. The device of claim 60 wherein the interpolation rate adjusting means is continuously variable over a defined range and further includes means for adjusting the rate of phase value interpolations.

63. The device of claim 60 wherein the device further comprises means for separating the measured frequency components into system contributions and excitation contributions and wherein the interpolation rate adjusting means varies the time-scale of the synthetic waveform by altering the rate at which values defining the system contributions are interpolated.

64. The device of claim 63 wherein the interpolation rate adjusting means alters the rate at which the system amplitudes and phases and the excitation amplitudes and frequencies are interpolated.
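
The cubic phase interpolation recited in claim 61, and written out in Equation (16) of the description, can be illustrated with a short Python sketch. It assumes the usual closed-form solution for the integer $M^*$ and the coefficients $\alpha(M^*)$ and $\beta(M^*)$, which this part of the text does not restate. All function and argument names, and the numbers in the usage note, are hypothetical. The last helper applies the start-up rule of Equation (17) for a newly born track.

```python
import math

def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Return (M, alpha, beta) for theta(t) = theta0 + omega0*t + alpha*t^2 + beta*t^3.

    theta0, omega0: phase (rad) and frequency (rad/sample) at the start of the frame
    theta1, omega1: measured phase and frequency at the end of the frame
    T:              frame length in samples
    Assumed closed form: M is the integer that makes the phase track smoothest.
    """
    x = (theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0
    M = round(x / (2.0 * math.pi))
    # Phase still to be absorbed by the quadratic and cubic terms
    d = theta1 - theta0 - omega0 * T + 2.0 * math.pi * M
    alpha = 3.0 / T**2 * d - (omega1 - omega0) / T
    beta = -2.0 / T**3 * d + (omega1 - omega0) / T**2
    return M, alpha, beta

def cubic_phase(t, theta0, omega0, alpha, beta):
    """Unwrapped phase of Equation (16) at time t (in samples)."""
    return theta0 + omega0 * t + alpha * t**2 + beta * t**3

def cubic_frequency(t, omega0, alpha, beta):
    """Instantaneous frequency: first derivative of the cubic phase."""
    return omega0 + 2.0 * alpha * t + 3.0 * beta * t**2

def birth_phase(theta1, omega1, N):
    """Start-up phase of Equation (17) for a track born at frame k+1."""
    return theta1 - omega1 * N
```

With this sketch the interpolated phase meets the measured end-of-frame phase modulo $2\pi$ and the interpolated frequency meets the measured end-of-frame frequency. For a born track, where the start frequency equals the measured frequency and the start phase comes from `birth_phase`, both `alpha` and `beta` come out as zero to within rounding, so the phase is linear over the first frame.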

* * * * *
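
The harmonic-bin peak selection used by the coder (claims 13, 40 and 52) can be sketched as follows. This is a minimal Python illustration, not the patented implementation: it assumes bins centered on the harmonics of the fundamental with a width of one fundamental, and it takes a hypothetical callable `stft_sample` standing in for sampling the short-time Fourier transform at a bin center. All names are illustrative.

```python
def select_harmonic_peaks(peaks, f0, bandwidth, stft_sample):
    """Keep exactly one (amplitude, phase) pair per harmonic frequency bin.

    peaks:       list of (frequency_hz, amplitude, phase) measured in one frame
    f0:          estimated fundamental frequency in Hz
    bandwidth:   bandwidth of the coded speech in Hz
    stft_sample: hypothetical callable, frequency_hz -> (amplitude, phase)
    """
    # Number of components to code: bandwidth divided by the fundamental
    n_bins = int(bandwidth // f0)
    selected = []
    for h in range(1, n_bins + 1):
        center = h * f0
        # Assumed bin edges: half a fundamental on either side of the harmonic
        lo, hi = center - f0 / 2.0, center + f0 / 2.0
        in_bin = [p for p in peaks if lo <= p[0] < hi]
        if in_bin:
            # More than one peak in the bin: retain only the largest
            _, amp, phase = max(in_bin, key=lambda p: p[1])
        else:
            # Empty bin: create an artificial peak at the bin center
            amp, phase = stft_sample(center)
        selected.append((center, amp, phase))
    return selected
```

Because the output always has one entry per bin, the number of coded components depends only on the fundamental and the bandwidth. With a 4 kHz bandwidth and a 250 Hz fundamental, for example, the sketch returns 16 components.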
UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION

PATENT NO. : 4,885,790

DATED : December 5, 1989

INVENTOR(S) : Robert J. McAulay et al.

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby 
corrected as shown below: 

column 6, lines 16 and 17, replace the three 
appearances of "K*!" with — K+1— . 

column 6, line 42, replace ".uK+V with -o)n,K+l— . 
Column 6, line 43, replace with — un^--. 

Column 6, line 45, replace "(o^+V with ~»m^+^— • 

Column 6, line 51, replace with — um^+l— . 

Column 6, line 52, replace "«K+V with 
Column 6, line 57, replace "icn" with — i^n— . 

column 6, line 65, replace "«K+V with — «n,K+l--. 



Column 7, line 18, replace 
Column 7, line 19, replace 
Column 7, line 21, replace 



-"K+^m- 


with 






with 


--«mK^l— 




with 


---m^^l— 



Column 8, line 7, replace - I i=iL(^>" with 2 YJl^ 
Column 8, line 21, replace "I'th" with — fi,'th~. 
Column 8, line 26, replace "Ar+i" with — aK+1--. 
Column 8, line 29, replace "I'th" with — SL'th— . 
Column 8, line 51, replace "e(t)" with — e(t) — . 
Column 8, line 56, replace "e(t)" with — e(t)— . 
Column 8, line 59, replace -tie(t)" with --e(t)--. 
Column 8, line 60, replace "e(t)- with — e(t)--. 
Column 8, line 62, replace "1" with ~ Jt— . 
Column 10, line 8, replace "I'th" with — 2,'th--. 
Column 10, line 11, replace " I i=i^<^>" with 2 \ 
Column 13, line 66, Equation (23) should read
$$s'(n) = \sum_{\ell=1}^{L(n)} A_\ell(\rho^{-1} n)\,\cos\left[\Omega_\ell(\rho^{-1} n)/\rho^{-1} + \Phi_\ell(\rho^{-1} n)\right]$$
Column 15, line 34, replace "regarless" with --regardless--.



Signed and Sealed this
Fourth Day of June, 1991

Attest:

Attesting Officer

HARRY F. MANBECK, JR.
Commissioner of Patents and Trademarks